Overview

Dataset statistics

Number of variables: 25
Number of observations: 105542
Missing cells: 416
Missing cells (%): < 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 20.1 MiB
Average record size in memory: 200.0 B
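
As a rough cross-check of the figures above, the missing-cell percentage and the average record size can be recomputed from the counts. This is a minimal sketch that uses only the numbers in this table. The variable names are illustrative, and the assumption that the percentage is taken over all cells (variables × observations) is not stated in the report.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 25
n_observations = 105542
missing_cells = 416
total_size_mib = 20.1

# Assumption: the missing-cell percentage is relative to all cells.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average record size in bytes (1 MiB = 1024 * 1024 bytes).
avg_record_bytes = total_size_mib * 1024 ** 2 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.4f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

The missing-cell share comes out well below 0.1%, and the record size comes out at roughly 200 bytes, which agrees with the rounded values in the table.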

Variable types

Numeric: 11
Text: 14

Alerts

article_id is highly overall correlated with product_code [High correlation]
product_code is highly overall correlated with article_id [High correlation]
department_no is highly overall correlated with index_group_no [High correlation]
index_group_no is highly overall correlated with department_no [High correlation]
graphical_appearance_no is highly skewed (γ1 = -45.01901161) [Skewed]
article_id has unique values [Unique]
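
One plausible reading of the article_id / product_code alert is that product_code is simply article_id with its last three digits dropped. The sketch below checks this against the ten smallest article_id values listed later in this report. The integer-division rule is an observation from those listed values, not something the report documents.

```python
from collections import Counter

# The ten smallest article_id values listed in this report.
article_ids = [
    108775015, 108775044, 108775051,
    110065001, 110065002, 110065011,
    111565001, 111565003,
    111586001, 111593001,
]

# Assumption: product_code == article_id // 1000 (drop the last 3 digits).
derived_codes = Counter(a // 1000 for a in article_ids)

for code, count in sorted(derived_codes.items()):
    print(code, count)
```

The derived counts (3, 3, 2, 1, 1) match the counts the report lists for the five smallest product_code values, which would explain the near-perfect correlation between the two columns.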

Reproduction

Analysis started: 2023-10-01 11:39:07.548497
Analysis finished: 2023-10-01 11:39:15.423440
Duration: 7.87 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
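
The reported duration follows directly from the two timestamps. A minimal sketch using only the standard library (the variable names are illustrative):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the "Reproduction" section.
started = datetime.strptime("2023-10-01 11:39:07.548497", FMT)
finished = datetime.strptime("2023-10-01 11:39:15.423440", FMT)

duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```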

Variables

article_id
Real number (ℝ)

HIGH CORRELATION, UNIQUE

Distinct: 105542
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.9842457 × 10^8
Minimum: 1.0877502 × 10^8
Maximum: 9.59461 × 10^8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 824.7 KiB

Quantile statistics

Minimum: 1.0877502 × 10^8
5-th percentile: 4.9381002 × 10^8
Q1: 6.169925 × 10^8
median: 7.02213 × 10^8
Q3: 7.96703 × 10^8
95-th percentile: 8.8937901 × 10^8
Maximum: 9.59461 × 10^8
Range: 8.5068599 × 10^8
Interquartile range (IQR): 1.797105 × 10^8

Descriptive statistics

Standard deviation: 1.2846238 × 10^8
Coefficient of variation (CV): 0.18393165
Kurtosis: 0.66097576
Mean: 6.9842457 × 10^8
Median Absolute Deviation (MAD): 90074996
Skewness: -0.57728335
Sum: 7.3713126 × 10^13
Variance: 1.6502583 × 10^16
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)
Value  Count  Frequency (%)
108775015  1  < 0.1%
760158001  1  < 0.1%
760214002  1  < 0.1%
760208001  1  < 0.1%
760195006  1  < 0.1%
760195005  1  < 0.1%
760195004  1  < 0.1%
760195003  1  < 0.1%
760195002  1  < 0.1%
760195001  1  < 0.1%
Other values (105532)  105532  > 99.9%

Value  Count  Frequency (%)
108775015  1  < 0.1%
108775044  1  < 0.1%
108775051  1  < 0.1%
110065001  1  < 0.1%
110065002  1  < 0.1%
110065011  1  < 0.1%
111565001  1  < 0.1%
111565003  1  < 0.1%
111586001  1  < 0.1%
111593001  1  < 0.1%

Value  Count  Frequency (%)
959461001  1  < 0.1%
957375001  1  < 0.1%
956217002  1  < 0.1%
953763001  1  < 0.1%
953450001  1  < 0.1%
952938001  1  < 0.1%
952937003  1  < 0.1%
952267001  1  < 0.1%
950449002  1  < 0.1%
949594001  1  < 0.1%
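
The derived statistics for article_id can be recomputed from the primary ones, which also confirms the powers of ten involved. This is a minimal sketch using the rounded figures printed in the report, so agreement is only expected to within rounding. The names are illustrative.

```python
import math

# Rounded figures copied from the article_id statistics.
mean = 6.9842457e8
std = 1.2846238e8
minimum = 1.0877502e8
maximum = 9.59461e8
q1 = 6.169925e8
q3 = 7.96703e8

cv = std / mean                 # coefficient of variation
value_range = maximum - minimum
iqr = q3 - q1                   # interquartile range
variance = std ** 2

print(f"CV       = {cv:.8f}")
print(f"Range    = {value_range:.7e}")
print(f"IQR      = {iqr:.6e}")
print(f"Variance = {variance:.7e}")
```

The results agree with the reported CV of 0.18393165, range of 8.5068599 × 10^8, IQR of 1.797105 × 10^8 and variance of 1.6502583 × 10^16.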

product_code
Real number (ℝ)

HIGH CORRELATION 

Distinct: 47224
Distinct (%): 44.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 698424.56
Minimum: 108775
Maximum: 959461
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 824.7 KiB

Quantile statistics

Minimum: 108775
5-th percentile: 493810
Q1: 616992.5
median: 702213
Q3: 796703
95-th percentile: 889379
Maximum: 959461
Range: 850686
Interquartile range (IQR): 179710.5

Descriptive statistics

Standard deviation: 128462.38
Coefficient of variation (CV): 0.18393165
Kurtosis: 0.66097587
Mean: 698424.56
Median Absolute Deviation (MAD): 90075
Skewness: -0.57728339
Sum: 7.3713125 × 10^10
Variance: 1.6502584 × 10^10
Monotonicity: Increasing
Histogram with fixed size bins (bins=50)
Value  Count  Frequency (%)
783707  75  0.1%
684021  70  0.1%
699923  52  < 0.1%
699755  49  < 0.1%
685604  46  < 0.1%
739659  44  < 0.1%
664074  41  < 0.1%
570002  41  < 0.1%
562245  41  < 0.1%
685816  41  < 0.1%
Other values (47214)  105042  99.5%

Value  Count  Frequency (%)
108775  3  < 0.1%
110065  3  < 0.1%
111565  2  < 0.1%
111586  1  < 0.1%
111593  1  < 0.1%
111609  1  < 0.1%
112679  2  < 0.1%
114428  2  < 0.1%
116379  1  < 0.1%
118458  7  < 0.1%

Value  Count  Frequency (%)
959461  1  < 0.1%
957375  1  < 0.1%
956217  1  < 0.1%
953763  1  < 0.1%
953450  1  < 0.1%
952938  1  < 0.1%
952937  1  < 0.1%
952267  1  < 0.1%
950449  1  < 0.1%
949594  1  < 0.1%
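
The report labels article_id as "Strictly increasing" but product_code only as "Increasing". The difference is whether repeated neighbouring values are allowed. A minimal sketch follows; the helper names are hypothetical, and the sample assumes the rows are ordered as in the smallest-values table above, with each code repeated by its count.

```python
def is_increasing(values):
    """True if every value is >= its predecessor (ties allowed)."""
    return all(a <= b for a, b in zip(values, values[1:]))


def is_strictly_increasing(values):
    """True if every value is > its predecessor (no ties)."""
    return all(a < b for a, b in zip(values, values[1:]))


# Smallest product_code values, repeated according to their counts
# (assumed row order; the report does not show the rows themselves).
product_codes = [108775] * 3 + [110065] * 3 + [111565] * 2 + [111586, 111593]

print(is_increasing(product_codes))           # ties are fine here
print(is_strictly_increasing(product_codes))  # ties break strictness
```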
Distinct: 45875
Distinct (%): 43.5%
Missing: 0
Missing (%): 0.0%
Memory size: 824.7 KiB

Length

Max length: 30
Median length: 23
Mean length: 15.535569
Min length: 1

Characters and Unicode

Total characters: 1639655
Distinct characters: 91
Distinct categories: 12
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 22920
Unique (%): 21.7%

Sample

1st row: Strap top
2nd row: Strap top
3rd row: Strap top (1)
4th row: OP T-shirt (Idro)
5th row: OP T-shirt (Idro)
ValueCountFrequency (%)
dress 7825
 
2.6%
tee 4553
 
1.5%
top 3938
 
1.3%
shorts 3555
 
1.2%
fancy 2796
 
0.9%
ls 2336
 
0.8%
hood 2294
 
0.8%
sb 2252
 
0.8%
set 2133
 
0.7%
1 2043
 
0.7%
Other values (13649) 261891
88.6%

Most occurring characters

ValueCountFrequency (%)
190600
 
11.6%
e 116144
 
7.1%
a 94570
 
5.8%
s 79849
 
4.9%
r 78145
 
4.8%
i 76131
 
4.6%
o 67798
 
4.1%
n 65393
 
4.0%
t 63950
 
3.9%
l 58420
 
3.6%
Other values (81) 748655
45.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 988791
60.3%
Uppercase Letter 418961
25.6%
Space Separator 190600
 
11.6%
Decimal Number 18512
 
1.1%
Dash Punctuation 7701
 
0.5%
Other Punctuation 6650
 
0.4%
Open Punctuation 3937
 
0.2%
Close Punctuation 3914
 
0.2%
Math Symbol 537
 
< 0.1%
Connector Punctuation 30
 
< 0.1%
Other values (2) 22
 
< 0.1%
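
The category names in the table above correspond to Unicode general categories. As an illustration, the sketch below counts them for two of the sample rows using Python's standard `unicodedata` module. The two-letter codes are the standard abbreviations (for example `Ll` for Lowercase Letter, `Zs` for Space Separator); this is not necessarily how the profiling tool computes its counts.

```python
import unicodedata
from collections import Counter

# Two of the sample rows shown for this variable.
samples = ["Strap top (1)", "OP T-shirt (Idro)"]

# Count the Unicode general category of every character.
category_counts = Counter(
    unicodedata.category(ch) for text in samples for ch in text
)

for code, count in category_counts.most_common():
    print(code, count)
```

For these two strings the lowercase letters dominate, followed by uppercase letters and spaces, which mirrors the ordering in the full table.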

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 116144
11.7%
a 94570
 
9.6%
s 79849
 
8.1%
r 78145
 
7.9%
i 76131
 
7.7%
o 67798
 
6.9%
n 65393
 
6.6%
t 63950
 
6.5%
l 58420
 
5.9%
d 33369
 
3.4%
Other values (23) 255022
25.8%
Uppercase Letter
ValueCountFrequency (%)
S 45068
 
10.8%
E 32036
 
7.6%
A 27062
 
6.5%
T 26995
 
6.4%
L 25542
 
6.1%
P 24833
 
5.9%
B 23677
 
5.7%
C 22072
 
5.3%
R 21928
 
5.2%
I 19528
 
4.7%
Other values (23) 150220
35.9%
Decimal Number
ValueCountFrequency (%)
2 5117
27.6%
1 3776
20.4%
3 2769
15.0%
5 2213
12.0%
9 2071
11.2%
7 853
 
4.6%
4 516
 
2.8%
0 465
 
2.5%
8 437
 
2.4%
6 295
 
1.6%
Other Punctuation
ValueCountFrequency (%)
. 3215
48.3%
/ 2881
43.3%
& 273
 
4.1%
: 181
 
2.7%
! 44
 
0.7%
' 40
 
0.6%
? 16
 
0.2%
Space Separator
ValueCountFrequency (%)
190600
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 7701
100.0%
Open Punctuation
ValueCountFrequency (%)
( 3937
100.0%
Close Punctuation
ValueCountFrequency (%)
) 3914
100.0%
Math Symbol
ValueCountFrequency (%)
+ 537
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 30
100.0%
Modifier Symbol
ValueCountFrequency (%)
^ 21
100.0%
Other Symbol
ValueCountFrequency (%)
© 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1407752
85.9%
Common 231903
 
14.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 116144
 
8.3%
a 94570
 
6.7%
s 79849
 
5.7%
r 78145
 
5.6%
i 76131
 
5.4%
o 67798
 
4.8%
n 65393
 
4.6%
t 63950
 
4.5%
l 58420
 
4.1%
S 45068
 
3.2%
Other values (56) 662284
47.0%
Common
ValueCountFrequency (%)
190600
82.2%
- 7701
 
3.3%
2 5117
 
2.2%
( 3937
 
1.7%
) 3914
 
1.7%
1 3776
 
1.6%
. 3215
 
1.4%
/ 2881
 
1.2%
3 2769
 
1.2%
5 2213
 
1.0%
Other values (15) 5780
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1639490
> 99.9%
None 165
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
190600
 
11.6%
e 116144
 
7.1%
a 94570
 
5.8%
s 79849
 
4.9%
r 78145
 
4.8%
i 76131
 
4.6%
o 67798
 
4.1%
n 65393
 
4.0%
t 63950
 
3.9%
l 58420
 
3.6%
Other values (66) 748490
45.7%
None
ValueCountFrequency (%)
ö 41
24.8%
é 35
21.2%
É 16
 
9.7%
Ö 13
 
7.9%
å 12
 
7.3%
ä 9
 
5.5%
è 9
 
5.5%
í 7
 
4.2%
Ä 7
 
4.2%
È 6
 
3.6%
Other values (5) 10
 
6.1%

product_type_no
Real number (ℝ)

Distinct: 132
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 234.86187
Minimum: -1
Maximum: 762
Zeros: 0
Zeros (%): 0.0%
Negative: 121
Negative (%): 0.1%
Memory size: 824.7 KiB

Quantile statistics

Minimum: -1
5-th percentile: 70
Q1: 252
median: 259
Q3: 272
95-th percentile: 304
Maximum: 762
Range: 763
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 75.049308
Coefficient of variation (CV): 0.31954658
Kurtosis: 1.1655822
Mean: 234.86187
Median Absolute Deviation (MAD): 13
Skewness: -1.4230313
Sum: 24787792
Variance: 5632.3986
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
272 11169
 
10.6%
265 10362
 
9.8%
252 9302
 
8.8%
255 7904
 
7.5%
254 4155
 
3.9%
258 3979
 
3.8%
262 3940
 
3.7%
274 3939
 
3.7%
259 3405
 
3.2%
253 2991
 
2.8%
Other values (122) 44396
42.1%
ValueCountFrequency (%)
-1 121
 
0.1%
49 48
 
< 0.1%
57 662
0.6%
59 1307
1.2%
60 50
 
< 0.1%
66 1280
1.2%
67 458
 
0.4%
68 180
 
0.2%
69 573
0.5%
70 1159
1.1%
ValueCountFrequency (%)
762 3
 
< 0.1%
761 5
 
< 0.1%
532 3
 
< 0.1%
529 4
 
< 0.1%
525 1
 
< 0.1%
523 2
 
< 0.1%
521 7
 
< 0.1%
515 6
 
< 0.1%
514 1
 
< 0.1%
512 24
< 0.1%
Distinct131
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size824.7 KiB

Length

Max length24
Median length19
Mean length7.5308787
Min length3

Characters and Unicode

Total characters794824
Distinct characters51
Distinct categories5 ?
Distinct scripts2 ?
Distinct blocks1 ?

Unique

Unique12 ?
Unique (%)< 0.1%

Sample

1st rowVest top
2nd rowVest top
3rd rowVest top
4th rowBra
5th rowBra
ValueCountFrequency (%)
trousers 11299
 
9.2%
dress 10362
 
8.5%
sweater 9302
 
7.6%
top 8142
 
6.7%
t-shirt 7904
 
6.5%
bottom 4275
 
3.5%
blouse 3979
 
3.3%
jacket 3940
 
3.2%
shorts 3939
 
3.2%
shirt 3854
 
3.1%
Other values (140) 55357
45.2%

Most occurring characters

ValueCountFrequency (%)
e 87702
 
11.0%
r 86934
 
10.9%
s 86754
 
10.9%
t 65600
 
8.3%
o 51144
 
6.4%
a 50695
 
6.4%
i 40748
 
5.1%
S 29213
 
3.7%
T 25917
 
3.3%
u 23025
 
2.9%
Other values (41) 247092
31.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 652636
82.1%
Uppercase Letter 110870
 
13.9%
Space Separator 16811
 
2.1%
Dash Punctuation 7910
 
1.0%
Other Punctuation 6597
 
0.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 87702
13.4%
r 86934
13.3%
s 86754
13.3%
t 65600
10.1%
o 51144
7.8%
a 50695
7.8%
i 40748
 
6.2%
u 23025
 
3.5%
h 20431
 
3.1%
n 17817
 
2.7%
Other values (15) 121786
18.7%
Uppercase Letter
ValueCountFrequency (%)
S 29213
26.3%
T 25917
23.4%
B 12500
11.3%
D 10698
 
9.6%
H 5688
 
5.1%
J 5117
 
4.6%
U 3788
 
3.4%
P 3513
 
3.2%
V 2991
 
2.7%
C 2696
 
2.4%
Other values (12) 8749
 
7.9%
Other Punctuation
ValueCountFrequency (%)
/ 6594
> 99.9%
. 3
 
< 0.1%
Space Separator
ValueCountFrequency (%)
16811
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 7910
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 763506
96.1%
Common 31318
 
3.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 87702
11.5%
r 86934
11.4%
s 86754
11.4%
t 65600
 
8.6%
o 51144
 
6.7%
a 50695
 
6.6%
i 40748
 
5.3%
S 29213
 
3.8%
T 25917
 
3.4%
u 23025
 
3.0%
Other values (37) 215774
28.3%
Common
ValueCountFrequency (%)
16811
53.7%
- 7910
25.3%
/ 6594
 
21.1%
. 3
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 794824
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 87702
 
11.0%
r 86934
 
10.9%
s 86754
 
10.9%
t 65600
 
8.3%
o 51144
 
6.4%
a 50695
 
6.4%
i 40748
 
5.1%
S 29213
 
3.7%
T 25917
 
3.3%
u 23025
 
2.9%
Other values (41) 247092
31.1%
Distinct19
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size824.7 KiB

Length

Max length21
Median length18
Mean length15.44064
Min length3

Characters and Unicode

Total characters1629636
Distinct characters35
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowGarment Upper body
2nd rowGarment Upper body
3rd rowGarment Upper body
4th rowUnderwear
5th rowUnderwear
ValueCountFrequency (%)
garment 75854
28.9%
body 75845
28.9%
upper 42741
16.3%
lower 19812
 
7.6%
full 13292
 
5.1%
accessories 11158
 
4.3%
underwear 5490
 
2.1%
shoes 5283
 
2.0%
swimwear 3127
 
1.2%
socks 2442
 
0.9%
Other values (16) 7102
 
2.7%

Most occurring characters

ValueCountFrequency (%)
e 182285
 
11.2%
r 165779
 
10.2%
156604
 
9.6%
o 114727
 
7.0%
a 86526
 
5.3%
p 85482
 
5.2%
n 81847
 
5.0%
d 81398
 
5.0%
t 80347
 
4.9%
m 79047
 
4.9%
Other values (25) 515594
31.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1286698
79.0%
Uppercase Letter 183838
 
11.3%
Space Separator 156604
 
9.6%
Other Punctuation 2496
 
0.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 182285
14.2%
r 165779
12.9%
o 114727
8.9%
a 86526
 
6.7%
p 85482
 
6.6%
n 81847
 
6.4%
d 81398
 
6.3%
t 80347
 
6.2%
m 79047
 
6.1%
y 75850
 
5.9%
Other values (11) 253410
19.7%
Uppercase Letter
ValueCountFrequency (%)
G 75854
41.3%
U 48406
26.3%
L 19812
 
10.8%
F 13307
 
7.2%
A 11158
 
6.1%
S 10866
 
5.9%
T 2442
 
1.3%
N 1899
 
1.0%
C 49
 
< 0.1%
B 25
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
& 2442
97.8%
/ 54
 
2.2%
Space Separator
ValueCountFrequency (%)
156604
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1470536
90.2%
Common 159100
 
9.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 182285
12.4%
r 165779
 
11.3%
o 114727
 
7.8%
a 86526
 
5.9%
p 85482
 
5.8%
n 81847
 
5.6%
d 81398
 
5.5%
t 80347
 
5.5%
m 79047
 
5.4%
G 75854
 
5.2%
Other values (22) 437244
29.7%
Common
ValueCountFrequency (%)
156604
98.4%
& 2442
 
1.5%
/ 54
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1629636
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 182285
 
11.2%
r 165779
 
10.2%
156604
 
9.6%
o 114727
 
7.0%
a 86526
 
5.3%
p 85482
 
5.2%
n 81847
 
5.0%
d 81398
 
5.0%
t 80347
 
4.9%
m 79047
 
4.9%
Other values (25) 515594
31.6%

graphical_appearance_no
Real number (ℝ)

SKEWED 

Distinct: 30
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1009515.1
Minimum: -1
Maximum: 1010029
Zeros: 0
Zeros (%): 0.0%
Negative: 52
Negative (%): < 0.1%
Memory size: 824.7 KiB

Quantile statistics

Minimum: -1
5-th percentile: 1010001
Q1: 1010008
median: 1010016
Q3: 1010016
95-th percentile: 1010023
Maximum: 1010029
Range: 1010030
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 22413.586
Coefficient of variation (CV): 0.022202329
Kurtosis: 2024.75
Mean: 1009515.1
Median Absolute Deviation (MAD): 1
Skewness: -45.019012
Sum: 1.0654624 × 10^11
Variance: 5.0236883 × 10^8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=30)
ValueCountFrequency (%)
1010016 49747
47.1%
1010001 17165
 
16.3%
1010010 5938
 
5.6%
1010017 4990
 
4.7%
1010023 4842
 
4.6%
1010008 3215
 
3.0%
1010014 3098
 
2.9%
1010004 2178
 
2.1%
1010005 1830
 
1.7%
1010021 1513
 
1.4%
Other values (20) 11026
 
10.4%
ValueCountFrequency (%)
-1 52
 
< 0.1%
1010001 17165
16.3%
1010002 1341
 
1.3%
1010003 15
 
< 0.1%
1010004 2178
 
2.1%
1010005 1830
 
1.7%
1010006 681
 
0.6%
1010007 1165
 
1.1%
1010008 3215
 
3.0%
1010009 958
 
0.9%
ValueCountFrequency (%)
1010029 8
 
< 0.1%
1010028 86
 
0.1%
1010027 66
 
0.1%
1010026 1502
 
1.4%
1010025 153
 
0.1%
1010024 322
 
0.3%
1010023 4842
4.6%
1010022 830
 
0.8%
1010021 1513
 
1.4%
1010020 376
 
0.4%
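
The extreme skewness reported for graphical_appearance_no is most likely driven by the -1 entries, which sit about a million units away from every other code. The sketch below shows the effect on a small synthetic sample. The data are made up for illustration and are not the real distribution, and the moment-based formula is the plain population skewness, which may differ slightly from the estimator the profiling tool uses.

```python
def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Synthetic sample: two regular codes plus a single -1 sentinel.
regular = [1010001] * 300 + [1010016] * 700
with_sentinel = regular + [-1]

skew_without = skewness(regular)
skew_with = skewness(with_sentinel)

print(f"without -1: {skew_without:.3f}")
print(f"with -1:    {skew_with:.3f}")
```

Even one sentinel value pushes the skewness from a modest negative figure to a very large one, so the alert says more about the coding scheme than about the shape of the regular codes. Treating -1 as missing before computing shape statistics would be one option.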
Distinct30
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size824.7 KiB

Length

Max length19
Median length5
Mean length8.2858578
Min length3

Characters and Unicode

Total characters874506
Distinct characters42
Distinct categories5 ?
Distinct scripts2 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowSolid
2nd rowSolid
3rd rowStripe
4th rowSolid
5th rowSolid
ValueCountFrequency (%)
solid 49747
32.9%
pattern 17680
 
11.7%
all 17165
 
11.4%
over 17165
 
11.4%
print 6313
 
4.2%
melange 5938
 
3.9%
stripe 4990
 
3.3%
denim 4842
 
3.2%
front 3215
 
2.1%
placement 3098
 
2.0%
Other values (25) 21011
13.9%

Most occurring characters

ValueCountFrequency (%)
l 102988
11.8%
o 80380
 
9.2%
e 77881
 
8.9%
i 77859
 
8.9%
t 67513
 
7.7%
r 62943
 
7.2%
S 55696
 
6.4%
d 54006
 
6.2%
n 48443
 
5.5%
45622
 
5.2%
Other values (32) 201175
23.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 716271
81.9%
Uppercase Letter 107841
 
12.3%
Space Separator 45622
 
5.2%
Other Punctuation 3431
 
0.4%
Decimal Number 1341
 
0.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
l 102988
14.4%
o 80380
11.2%
e 77881
10.9%
i 77859
10.9%
t 67513
9.4%
r 62943
8.8%
d 54006
7.5%
n 48443
6.8%
a 35452
 
4.9%
p 32949
 
4.6%
Other values (13) 75857
10.6%
Uppercase Letter
ValueCountFrequency (%)
S 55696
51.6%
A 18521
 
17.2%
M 8460
 
7.8%
D 6864
 
6.4%
C 4706
 
4.4%
F 3215
 
3.0%
P 3098
 
2.9%
O 2017
 
1.9%
L 1513
 
1.4%
E 1165
 
1.1%
Other values (6) 2586
 
2.4%
Space Separator
ValueCountFrequency (%)
45622
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 3431
100.0%
Decimal Number
ValueCountFrequency (%)
3 1341
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 824112
94.2%
Common 50394
 
5.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
l 102988
12.5%
o 80380
9.8%
e 77881
9.5%
i 77859
9.4%
t 67513
8.2%
r 62943
 
7.6%
S 55696
 
6.8%
d 54006
 
6.6%
n 48443
 
5.9%
a 35452
 
4.3%
Other values (29) 160951
19.5%
Common
ValueCountFrequency (%)
45622
90.5%
/ 3431
 
6.8%
3 1341
 
2.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 874506
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
l 102988
11.8%
o 80380
 
9.2%
e 77881
 
8.9%
i 77859
 
8.9%
t 67513
 
7.7%
r 62943
 
7.2%
S 55696
 
6.4%
d 54006
 
6.2%
n 48443
 
5.5%
45622
 
5.2%
Other values (32) 201175
23.0%

colour_group_code
Real number (ℝ)

Distinct: 50
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 32.233822
Minimum: -1
Maximum: 93
Zeros: 0
Zeros (%): 0.0%
Negative: 28
Negative (%): < 0.1%
Memory size: 824.7 KiB

Quantile statistics

Minimum: -1
5-th percentile: 7
Q1: 9
median: 14
Q3: 52
95-th percentile: 81
Maximum: 93
Range: 94
Interquartile range (IQR): 43

Descriptive statistics

Standard deviation: 28.086154
Coefficient of variation (CV): 0.87132561
Kurtosis: -1.0610471
Mean: 32.233822
Median Absolute Deviation (MAD): 7
Skewness: 0.7138227
Sum: 3402022
Variance: 788.83205
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
9 22670
21.5%
73 12171
 
11.5%
10 9542
 
9.0%
51 5811
 
5.5%
7 4487
 
4.3%
12 3356
 
3.2%
72 3308
 
3.1%
42 3056
 
2.9%
71 3012
 
2.9%
19 2767
 
2.6%
Other values (40) 35362
33.5%
ValueCountFrequency (%)
-1 28
 
< 0.1%
1 105
 
0.1%
2 31
 
< 0.1%
3 709
 
0.7%
4 94
 
0.1%
5 1377
 
1.3%
6 2105
 
2.0%
7 4487
 
4.3%
8 2731
 
2.6%
9 22670
21.5%
ValueCountFrequency (%)
93 2106
 
2.0%
92 815
 
0.8%
91 681
 
0.6%
90 129
 
0.1%
83 473
 
0.4%
82 435
 
0.4%
81 1027
 
1.0%
80 14
 
< 0.1%
73 12171
11.5%
72 3308
 
3.1%
Distinct50
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size824.7 KiB

Length

Max length15
Median length14
Mean length7.4805101
Min length3

Characters and Unicode

Total characters789508
Distinct characters38
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowBlack
2nd rowWhite
3rd rowOff White
4th rowBlack
5th rowWhite
ValueCountFrequency (%)
dark 23498
15.0%
black 22670
14.4%
light 19334
12.3%
blue 18542
11.8%
white 12268
7.8%
pink 9442
 
6.0%
grey 9323
 
5.9%
beige 7378
 
4.7%
red 5795
 
3.7%
green 3731
 
2.4%
Other values (16) 25065
16.0%

Most occurring characters

ValueCountFrequency (%)
e 87703
 
11.1%
k 58405
 
7.4%
i 58311
 
7.4%
l 54192
 
6.9%
a 52335
 
6.6%
51504
 
6.5%
B 50155
 
6.4%
r 49945
 
6.3%
h 40420
 
5.1%
t 33220
 
4.2%
Other values (28) 253318
32.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 580770
73.6%
Uppercase Letter 157140
 
19.9%
Space Separator 51504
 
6.5%
Other Punctuation 94
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 87703
15.1%
k 58405
10.1%
i 58311
10.0%
l 54192
9.3%
a 52335
9.0%
r 49945
8.6%
h 40420
7.0%
t 33220
 
5.7%
g 30050
 
5.2%
u 23536
 
4.1%
Other values (12) 92653
16.0%
Uppercase Letter
ValueCountFrequency (%)
B 50155
31.9%
D 23498
15.0%
L 19334
 
12.3%
G 17424
 
11.1%
W 12268
 
7.8%
P 10538
 
6.7%
O 7651
 
4.9%
R 5795
 
3.7%
Y 4899
 
3.1%
K 2767
 
1.8%
Other values (4) 2811
 
1.8%
Space Separator
ValueCountFrequency (%)
51504
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 94
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 737910
93.5%
Common 51598
 
6.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 87703
 
11.9%
k 58405
 
7.9%
i 58311
 
7.9%
l 54192
 
7.3%
a 52335
 
7.1%
B 50155
 
6.8%
r 49945
 
6.8%
h 40420
 
5.5%
t 33220
 
4.5%
g 30050
 
4.1%
Other values (26) 223174
30.2%
Common
ValueCountFrequency (%)
51504
99.8%
/ 94
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 789508
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 87703
 
11.1%
k 58405
 
7.4%
i 58311
 
7.4%
l 54192
 
6.9%
a 52335
 
6.6%
51504
 
6.5%
B 50155
 
6.4%
r 49945
 
6.3%
h 40420
 
5.1%
t 33220
 
4.2%
Other values (28) 253318
32.1%

perceived_colour_value_id
Real number (ℝ)

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.2061833
Minimum: -1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Negative: 28
Negative (%): < 0.1%
Memory size: 824.7 KiB

Quantile statistics

Minimum: -1
5-th percentile: 1
Q1: 2
median: 4
Q3: 4
95-th percentile: 7
Maximum: 7
Range: 8
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.5638389
Coefficient of variation (CV): 0.48775718
Kurtosis: -0.094881985
Mean: 3.2061833
Median Absolute Deviation (MAD): 1
Skewness: 0.27399945
Sum: 338387
Variance: 2.4455922
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=8)
ValueCountFrequency (%)
4 42706
40.5%
1 22152
21.0%
3 15739
 
14.9%
2 12630
 
12.0%
5 6471
 
6.1%
7 5711
 
5.4%
6 105
 
0.1%
-1 28
 
< 0.1%
ValueCountFrequency (%)
-1 28
 
< 0.1%
1 22152
21.0%
2 12630
 
12.0%
3 15739
 
14.9%
4 42706
40.5%
5 6471
 
6.1%
6 105
 
0.1%
7 5711
 
5.4%
ValueCountFrequency (%)
7 5711
 
5.4%
6 105
 
0.1%
5 6471
 
6.1%
4 42706
40.5%
3 15739
 
14.9%
2 12630
 
12.0%
1 22152
21.0%
-1 28
 
< 0.1%
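
Because perceived_colour_value_id has only eight distinct values, its summary statistics can be rebuilt entirely from the frequency table above. This is a minimal sketch with illustrative names. It uses the n - 1 denominator for the variance, which is an assumption on my part, chosen because it reproduces the reported figure.

```python
# Value -> count, copied from the frequency table.
value_counts = {4: 42706, 1: 22152, 3: 15739, 2: 12630,
                5: 6471, 7: 5711, 6: 105, -1: 28}

n = sum(value_counts.values())
total = sum(value * count for value, count in value_counts.items())
mean = total / n

# Sample variance (n - 1 denominator, assumed).
variance = sum(
    count * (value - mean) ** 2 for value, count in value_counts.items()
) / (n - 1)

print(f"n        = {n}")
print(f"sum      = {total}")
print(f"mean     = {mean:.7f}")
print(f"variance = {variance:.7f}")
```

The count and sum match the reported 105542 observations and Sum of 338387 exactly, and the mean and variance agree with the reported values to within rounding.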
Distinct8
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size824.7 KiB

Length

Max length12
Median length11
Mean length6.8123022
Min length4

Characters and Unicode

Total characters718984
Distinct characters23
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowDark
2nd rowLight
3rd rowDusty Light
4th rowDark
5th rowLight
ValueCountFrequency (%)
dark 42706
30.4%
light 37891
27.0%
dusty 34782
24.8%
medium 18341
13.1%
bright 6471
 
4.6%
undefined 105
 
0.1%
unknown 28
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
t 79144
11.0%
D 77488
10.8%
i 62808
 
8.7%
u 53123
 
7.4%
r 49177
 
6.8%
h 44362
 
6.2%
g 44362
 
6.2%
k 42734
 
5.9%
a 42706
 
5.9%
L 37891
 
5.3%
Other values (13) 185189
25.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 543878
75.6%
Uppercase Letter 140324
 
19.5%
Space Separator 34782
 
4.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t 79144
14.6%
i 62808
11.5%
u 53123
9.8%
r 49177
9.0%
h 44362
8.2%
g 44362
8.2%
k 42734
7.9%
a 42706
7.9%
y 34782
6.4%
s 34782
6.4%
Other values (7) 55898
10.3%
Uppercase Letter
ValueCountFrequency (%)
D 77488
55.2%
L 37891
27.0%
M 18341
 
13.1%
B 6471
 
4.6%
U 133
 
0.1%
Space Separator
ValueCountFrequency (%)
34782
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 684202
95.2%
Common 34782
 
4.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
t 79144
11.6%
D 77488
11.3%
i 62808
9.2%
u 53123
 
7.8%
r 49177
 
7.2%
h 44362
 
6.5%
g 44362
 
6.5%
k 42734
 
6.2%
a 42706
 
6.2%
L 37891
 
5.5%
Other values (12) 150407
22.0%
Common
ValueCountFrequency (%)
34782
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 718984
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t 79144
11.0%
D 77488
10.8%
i 62808
 
8.7%
u 53123
 
7.4%
r 49177
 
6.8%
h 44362
 
6.2%
g 44362
 
6.2%
k 42734
 
5.9%
a 42706
 
5.9%
L 37891
 
5.3%
Other values (13) 185189
25.8%
Distinct: 20
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.8079722
Minimum: -1
Maximum: 20
Zeros: 0
Zeros (%): 0.0%
Negative: 685
Negative (%): 0.6%
Memory size: 824.7 KiB

Quantile statistics

Minimum: -1
5-th percentile: 2
Q1: 4
median: 5
Q3: 11
95-th percentile: 19
Maximum: 20
Range: 21
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 5.376727
Coefficient of variation (CV): 0.68862015
Kurtosis: -0.36204043
Mean: 7.8079722
Median Absolute Deviation (MAD): 3
Skewness: 0.80137952
Sum: 824069
Variance: 28.909193
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=20)
ValueCountFrequency (%)
5 22585
21.4%
2 18469
17.5%
9 12665
12.0%
4 9403
8.9%
12 8924
 
8.5%
18 5878
 
5.6%
11 5657
 
5.4%
19 3526
 
3.3%
20 3181
 
3.0%
8 3121
 
3.0%
Other values (10) 12133
11.5%
ValueCountFrequency (%)
-1 685
 
0.6%
1 1223
 
1.2%
2 18469
17.5%
3 2734
 
2.6%
4 9403
8.9%
5 22585
21.4%
6 1100
 
1.0%
7 1829
 
1.7%
8 3121
 
3.0%
9 12665
12.0%
ValueCountFrequency (%)
20 3181
 
3.0%
19 3526
 
3.3%
18 5878
5.6%
16 3
 
< 0.1%
15 2180
 
2.1%
14 105
 
0.1%
13 2269
 
2.1%
12 8924
8.5%
11 5657
5.4%
10 5
 
< 0.1%
Distinct20
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size824.7 KiB

Length

Max length15
Median length12
Mean length4.9246082
Min length3

Characters and Unicode

Total characters519753
Distinct characters33
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowBlack
2nd rowWhite
3rd rowWhite
4th rowBlack
5th rowWhite
ValueCountFrequency (%)
black 22585
20.6%
blue 18469
16.8%
white 12665
11.5%
pink 9403
8.6%
grey 8924
 
8.1%
green 6715
 
6.1%
red 5878
 
5.4%
beige 5657
 
5.2%
khaki 3181
 
2.9%
yellow 3121
 
2.8%
Other values (11) 13233
12.0%

Most occurring characters

ValueCountFrequency (%)
e 83082
16.0%
l 52912
 
10.2%
B 48983
 
9.4%
k 35854
 
6.9%
i 33948
 
6.5%
a 31780
 
6.1%
c 23685
 
4.6%
r 23571
 
4.5%
n 23386
 
4.5%
u 23335
 
4.5%
Other values (23) 139217
26.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 408919
78.7%
Uppercase Letter 106545
 
20.5%
Space Separator 4289
 
0.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 83082
20.3%
l 52912
12.9%
k 35854
8.8%
i 33948
8.3%
a 31780
 
7.8%
c 23685
 
5.8%
r 23571
 
5.8%
n 23386
 
5.7%
u 23335
 
5.7%
h 15854
 
3.9%
Other values (10) 61512
15.0%
Uppercase Letter
ValueCountFrequency (%)
B 48983
46.0%
W 12665
 
11.9%
G 12458
 
11.7%
P 10503
 
9.9%
R 5878
 
5.5%
M 3403
 
3.2%
K 3181
 
3.0%
Y 3126
 
2.9%
O 2734
 
2.6%
T 1829
 
1.7%
Other values (2) 1785
 
1.7%
Space Separator
ValueCountFrequency (%)
4289
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 515464
99.2%
Common 4289
 
0.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 83082
16.1%
l 52912
 
10.3%
B 48983
 
9.5%
k 35854
 
7.0%
i 33948
 
6.6%
a 31780
 
6.2%
c 23685
 
4.6%
r 23571
 
4.6%
n 23386
 
4.5%
u 23335
 
4.5%
Other values (22) 134928
26.2%
Common
ValueCountFrequency (%)
4289
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 519753
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 83082
16.0%
l 52912
 
10.2%
B 48983
 
9.4%
k 35854
 
6.9%
i 33948
 
6.5%
a 31780
 
6.1%
c 23685
 
4.6%
r 23571
 
4.5%
n 23386
 
4.5%
u 23335
 
4.5%
Other values (23) 139217
26.8%

department_no
Real number (ℝ)

HIGH CORRELATION 

Distinct: 299
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4532.7778
Minimum: 1201
Maximum: 9989
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 824.7 KiB

Quantile statistics

Minimum: 1201
5-th percentile: 1338
Q1: 1676
median: 4222
Q3: 7389
95-th percentile: 8748
Maximum: 9989
Range: 8788
Interquartile range (IQR): 5713

Descriptive statistics

Standard deviation: 2712.692
Coefficient of variation (CV): 0.59846128
Kurtosis: -1.3964267
Mean: 4532.7778
Median Absolute Deviation (MAD): 2556
Skewness: 0.27135387
Sum: 4.7839844 × 10^8
Variance: 7358697.9
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
7616 2032
 
1.9%
1338 1921
 
1.8%
8716 1874
 
1.8%
4242 1839
 
1.7%
7648 1488
 
1.4%
1640 1429
 
1.4%
1636 1402
 
1.3%
1676 1359
 
1.3%
1344 1354
 
1.3%
1643 1339
 
1.3%
Other values (289) 89505
84.8%
ValueCountFrequency (%)
1201 829
0.8%
1202 16
 
< 0.1%
1212 299
 
0.3%
1222 238
 
0.2%
1241 87
 
0.1%
1244 667
0.6%
1310 251
 
0.2%
1313 630
0.6%
1322 1206
1.1%
1334 864
0.8%
ValueCountFrequency (%)
9989 122
 
0.1%
9986 513
0.5%
9985 579
0.5%
9984 236
0.2%
9020 33
 
< 0.1%
8956 363
0.3%
8917 421
0.4%
8888 269
0.3%
8852 281
0.3%
8815 21
 
< 0.1%
Distinct250
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Memory size824.7 KiB

Length

Max length40
Median length26
Mean length13.140219
Min length2

Characters and Unicode

Total characters1386845
Distinct characters60
Distinct categories7 ?
Distinct scripts2 ?
Distinct blocks1 ?

Unique

Unique6 ?
Unique (%)< 0.1%

Sample

1st rowJersey Basic
2nd rowJersey Basic
3rd rowJersey Basic
4th rowClean Lingerie
5th rowClean Lingerie
ValueCountFrequency (%)
jersey 24170
 
10.5%
girl 16349
 
7.1%
kids 14307
 
6.2%
fancy 13087
 
5.7%
boy 11674
 
5.1%
young 10428
 
4.5%
baby 7973
 
3.5%
knitwear 7498
 
3.2%
basic 7078
 
3.1%
woven 6640
 
2.9%
Other values (132) 111638
48.4%

Most occurring characters

ValueCountFrequency (%)
e 142984
 
10.3%
s 126069
 
9.1%
125300
 
9.0%
r 110268
 
8.0%
i 87155
 
6.3%
o 77105
 
5.6%
a 65051
 
4.7%
y 61342
 
4.4%
n 54902
 
4.0%
c 42943
 
3.1%
Other values (50) 493726
35.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1018643
73.5%
Uppercase Letter 229916
 
16.6%
Space Separator 125300
 
9.0%
Other Punctuation 9941
 
0.7%
Decimal Number 2079
 
0.1%
Math Symbol 615
 
< 0.1%
Dash Punctuation 351
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 142984
14.0%
s 126069
12.4%
r 110268
10.8%
i 87155
8.6%
o 77105
 
7.6%
a 65051
 
6.4%
y 61342
 
6.0%
n 54902
 
5.4%
c 42943
 
4.2%
t 41139
 
4.0%
Other values (16) 209685
20.6%
Uppercase Letter
ValueCountFrequency (%)
B 37938
16.5%
J 27069
11.8%
K 22977
10.0%
S 21805
9.5%
G 18285
8.0%
T 14115
 
6.1%
F 12780
 
5.6%
D 12281
 
5.3%
W 10518
 
4.6%
Y 10428
 
4.5%
Other values (13) 41720
18.1%
Decimal Number
ValueCountFrequency (%)
1 1748
84.1%
5 202
 
9.7%
6 64
 
3.1%
2 64
 
3.1%
7 1
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
/ 5742
57.8%
& 4073
41.0%
. 126
 
1.3%
Space Separator
ValueCountFrequency (%)
125300
100.0%
Math Symbol
ValueCountFrequency (%)
+ 615
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 351
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1248559
90.0%
Common 138286
 
10.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 142984
 
11.5%
s 126069
 
10.1%
r 110268
 
8.8%
i 87155
 
7.0%
o 77105
 
6.2%
a 65051
 
5.2%
y 61342
 
4.9%
n 54902
 
4.4%
c 42943
 
3.4%
t 41139
 
3.3%
Other values (39) 439601
35.2%
Common
ValueCountFrequency (%)
125300
90.6%
/ 5742
 
4.2%
& 4073
 
2.9%
1 1748
 
1.3%
+ 615
 
0.4%
- 351
 
0.3%
5 202
 
0.1%
. 126
 
0.1%
6 64
 
< 0.1%
2 64
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1386845
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 142984
 
10.3%
s 126069
 
9.1%
125300
 
9.0%
r 110268
 
8.0%
i 87155
 
6.3%
o 77105
 
5.6%
a 65051
 
4.7%
y 61342
 
4.4%
n 54902
 
4.0%
c 42943
 
3.1%
Other values (50) 493726
35.6%
Distinct10
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size824.7 KiB

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters105542
Distinct characters10
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
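
As a rough illustration of how such per-category counts can be derived, the sketch below tallies Unicode general categories with Python's standard `unicodedata` module. This is an assumption for illustration only: the report does not say which tooling it uses internally, the `CATEGORY_NAMES` mapping and `category_counts` helper are hypothetical, and the two input strings are taken from the Sample rows shown elsewhere in this report. Scripts and blocks are not covered, since `unicodedata` does not expose them.

```python
import unicodedata
from collections import Counter

# Readable names for the two-letter general-category codes (hypothetical mapping,
# limited to categories that appear in this report's tables).
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Sm": "Math Symbol",
}


def category_counts(values):
    """Count characters per Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the raw code for categories not named above.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# Two values copied from the Sample rows of this report.
sample = ["Ladieswear", "Lingeries/Tights"]
counts = category_counts(sample)
```

Applied to a whole column, the same loop would yield a table of the "Most occurring categories" kind.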

Unique

Unique0
Unique (%)0.0%

Sample

1st rowA
2nd rowA
3rd rowA
4th rowB
5th rowB
Value | Count | Frequency (%)
a | 26001 | 24.6%
d | 15149 | 14.4%
f | 12553 | 11.9%
h | 12007 | 11.4%
i | 9214 | 8.7%
g | 8875 | 8.4%
c | 6961 | 6.6%
b | 6775 | 6.4%
j | 4615 | 4.4%
s | 3392 | 3.2%

Most occurring characters

ValueCountFrequency (%)
A 26001
24.6%
D 15149
14.4%
F 12553
11.9%
H 12007
11.4%
I 9214
 
8.7%
G 8875
 
8.4%
C 6961
 
6.6%
B 6775
 
6.4%
J 4615
 
4.4%
S 3392
 
3.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 105542
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A 26001
24.6%
D 15149
14.4%
F 12553
11.9%
H 12007
11.4%
I 9214
 
8.7%
G 8875
 
8.4%
C 6961
 
6.6%
B 6775
 
6.4%
J 4615
 
4.4%
S 3392
 
3.2%

Most occurring scripts

ValueCountFrequency (%)
Latin 105542
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
A 26001
24.6%
D 15149
14.4%
F 12553
11.9%
H 12007
11.4%
I 9214
 
8.7%
G 8875
 
8.4%
C 6961
 
6.6%
B 6775
 
6.4%
J 4615
 
4.4%
S 3392
 
3.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 105542
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A 26001
24.6%
D 15149
14.4%
F 12553
11.9%
H 12007
11.4%
I 9214
 
8.7%
G 8875
 
8.4%
C 6961
 
6.6%
B 6775
 
6.4%
J 4615
 
4.4%
S 3392
 
3.2%

index_name
Text

Distinct10
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size824.7 KiB

Length

Max length30
Median length21
Mean length13.761725
Min length5

Characters and Unicode

Total characters1452440
Distinct characters41
Distinct categories6
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowLadieswear
2nd rowLadieswear
3rd rowLadieswear
4th rowLingeries/Tights
5th rowLingeries/Tights
Value | Count | Frequency (%)
sizes | 30096 | 16.5%
ladieswear | 26001 | 14.3%
children | 25836 | 14.2%
divided | 15149 | 8.3%
menswear | 12553 | 6.9%
92-140 | 12007 | 6.6%
accessories | 11576 | 6.4%
134-170 | 9214 | 5.1%
baby | 8875 | 4.9%
50-98 | 8875 | 4.9%
Other values (4) | 21743 | 12.0%

Most occurring characters

ValueCountFrequency (%)
e 196467
 
13.5%
i 155708
 
10.7%
s 123889
 
8.5%
r 90748
 
6.2%
d 89096
 
6.1%
a 85006
 
5.9%
76383
 
5.3%
w 47784
 
3.3%
n 45164
 
3.1%
L 39737
 
2.7%
Other values (31) 502458
34.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1025148
70.6%
Uppercase Letter 158604
 
10.9%
Decimal Number 150819
 
10.4%
Space Separator 76383
 
5.3%
Dash Punctuation 30096
 
2.1%
Other Punctuation 11390
 
0.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 196467
19.2%
i 155708
15.2%
s 123889
12.1%
r 90748
8.9%
d 89096
8.7%
a 85006
8.3%
w 47784
 
4.7%
n 45164
 
4.4%
h 32611
 
3.2%
z 30096
 
2.9%
Other values (10) 128579
12.5%
Decimal Number
ValueCountFrequency (%)
1 30435
20.2%
0 30096
20.0%
4 21221
14.1%
9 20882
13.8%
2 12007
 
8.0%
3 9214
 
6.1%
7 9214
 
6.1%
8 8875
 
5.9%
5 8875
 
5.9%
Uppercase Letter
ValueCountFrequency (%)
L 39737
25.1%
S 38103
24.0%
C 25836
16.3%
D 15149
 
9.6%
M 12553
 
7.9%
A 11576
 
7.3%
B 8875
 
5.6%
T 6775
 
4.3%
Other Punctuation
ValueCountFrequency (%)
/ 6775
59.5%
, 4615
40.5%
Space Separator
ValueCountFrequency (%)
76383
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 30096
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1183752
81.5%
Common 268688
 
18.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 196467
16.6%
i 155708
13.2%
s 123889
10.5%
r 90748
 
7.7%
d 89096
 
7.5%
a 85006
 
7.2%
w 47784
 
4.0%
n 45164
 
3.8%
L 39737
 
3.4%
S 38103
 
3.2%
Other values (18) 272050
23.0%
Common
ValueCountFrequency (%)
76383
28.4%
1 30435
 
11.3%
0 30096
 
11.2%
- 30096
 
11.2%
4 21221
 
7.9%
9 20882
 
7.8%
2 12007
 
4.5%
3 9214
 
3.4%
7 9214
 
3.4%
8 8875
 
3.3%
Other values (3) 20265
 
7.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1452440
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 196467
 
13.5%
i 155708
 
10.7%
s 123889
 
8.5%
r 90748
 
6.2%
d 89096
 
6.1%
a 85006
 
5.9%
76383
 
5.3%
w 47784
 
3.3%
n 45164
 
3.1%
L 39737
 
2.7%
Other values (31) 502458
34.6%

index_group_no
Real number (ℝ)

HIGH CORRELATION 

Distinct5
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean3.1715336
Minimum1
Maximum26
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size824.7 KiB

Quantile statistics

Minimum1
5-th percentile1
Q11
median2
Q34
95-th percentile4
Maximum26
Range25
Interquartile range (IQR)3

Descriptive statistics

Standard deviation4.3532344
Coefficient of variation (CV)1.372596
Kurtosis21.330669
Mean3.1715336
Median Absolute Deviation (MAD)1
Skewness4.5875897
Sum334730
Variance18.95065
MonotonicityNot monotonic
Histogram with fixed size bins (bins=5)
Value | Count | Frequency (%)
1 | 39737 | 37.7%
4 | 34711 | 32.9%
2 | 15149 | 14.4%
3 | 12553 | 11.9%
26 | 3392 | 3.2%

Value | Count | Frequency (%)
1 | 39737 | 37.7%
2 | 15149 | 14.4%
3 | 12553 | 11.9%
4 | 34711 | 32.9%
26 | 3392 | 3.2%

Value | Count | Frequency (%)
26 | 3392 | 3.2%
4 | 34711 | 32.9%
3 | 12553 | 11.9%
2 | 15149 | 14.4%
1 | 39737 | 37.7%
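
The descriptive statistics for index_group_no can be cross-checked against its frequency table. The sketch below recomputes the sum, mean, variance and standard deviation from the five value/count pairs listed above. Using the sample variance (n - 1 denominator) is an assumption, chosen because it reproduces the reported figures; the variable names are illustrative.

```python
import math

# Value -> count, copied from the index_group_no frequency table.
freq = {1: 39737, 2: 15149, 3: 12553, 4: 34711, 26: 3392}

n = sum(freq.values())                           # number of observations
total = sum(v * c for v, c in freq.items())      # the "Sum" statistic
mean = total / n

# Sample variance with an (n - 1) denominator (assumption, see lead-in).
variance = sum(c * (v - mean) ** 2 for v, c in freq.items()) / (n - 1)
std = math.sqrt(variance)

print(n, total, round(mean, 7), round(variance, 5), round(std, 7))
```

The single code 26 (3.2% of rows) sits far from the other values, which is consistent with the large positive skewness and kurtosis reported for this variable.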

index_group_name
Text

Distinct5
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size824.7 KiB

Length

Max length13
Median length10
Mean length10.157473
Min length5

Characters and Unicode

Total characters1072040
Distinct characters23
Distinct categories3
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowLadieswear
2nd rowLadieswear
3rd rowLadieswear
4th rowLadieswear
5th rowLadieswear
Value | Count | Frequency (%)
ladieswear | 39737 | 37.7%
baby/children | 34711 | 32.9%
divided | 15149 | 14.4%
menswear | 12553 | 11.9%
sport | 3392 | 3.2%

Most occurring characters

ValueCountFrequency (%)
e 154440
14.4%
a 126738
11.8%
d 104746
 
9.8%
i 104746
 
9.8%
r 90393
 
8.4%
s 52290
 
4.9%
w 52290
 
4.9%
n 47264
 
4.4%
L 39737
 
3.7%
C 34711
 
3.2%
Other values (13) 264685
24.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 897076
83.7%
Uppercase Letter 140253
 
13.1%
Other Punctuation 34711
 
3.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 154440
17.2%
a 126738
14.1%
d 104746
11.7%
i 104746
11.7%
r 90393
10.1%
s 52290
 
5.8%
w 52290
 
5.8%
n 47264
 
5.3%
l 34711
 
3.9%
h 34711
 
3.9%
Other values (6) 94747
10.6%
Uppercase Letter
ValueCountFrequency (%)
L 39737
28.3%
C 34711
24.7%
B 34711
24.7%
D 15149
 
10.8%
M 12553
 
9.0%
S 3392
 
2.4%
Other Punctuation
ValueCountFrequency (%)
/ 34711
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1037329
96.8%
Common 34711
 
3.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 154440
14.9%
a 126738
12.2%
d 104746
10.1%
i 104746
10.1%
r 90393
 
8.7%
s 52290
 
5.0%
w 52290
 
5.0%
n 47264
 
4.6%
L 39737
 
3.8%
C 34711
 
3.3%
Other values (12) 229974
22.2%
Common
ValueCountFrequency (%)
/ 34711
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1072040
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 154440
14.4%
a 126738
11.8%
d 104746
 
9.8%
i 104746
 
9.8%
r 90393
 
8.4%
s 52290
 
4.9%
w 52290
 
4.9%
n 47264
 
4.4%
L 39737
 
3.7%
C 34711
 
3.2%
Other values (13) 264685
24.7%

section_no
Real number (ℝ)

Distinct57
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean42.664219
Minimum2
Maximum97
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size824.7 KiB

Quantile statistics

Minimum2
5-th percentile6
Q120
median46
Q361
95-th percentile77
Maximum97
Range95
Interquartile range (IQR)41

Descriptive statistics

Standard deviation23.260105
Coefficient of variation (CV)0.54518999
Kurtosis-1.1000683
Mean42.664219
Median Absolute Deviation (MAD)20
Skewness-0.084535432
Sum4502867
Variance541.03248
MonotonicityNot monotonic
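
Several of the figures above are derived from others in the same block. As a small worked check, the sketch below recomputes the range, interquartile range, coefficient of variation and variance for section_no from the reported minimum, maximum, quartiles, mean and standard deviation. The variable names are illustrative, and the inputs are the rounded values printed in this report, so results agree only to the printed precision.

```python
# Figures copied from the section_no statistics above.
minimum, maximum = 2, 97
q1, q3 = 20, 61
mean, std = 42.664219, 23.260105

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance

print(value_range, iqr, round(cv, 8), round(variance, 5))
```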
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
15 | 7295 | 6.9%
53 | 7124 | 6.7%
44 | 4932 | 4.7%
76 | 4469 | 4.2%
77 | 3899 | 3.7%
61 | 3598 | 3.4%
79 | 3490 | 3.3%
11 | 3376 | 3.2%
46 | 3328 | 3.2%
66 | 3270 | 3.1%
Other values (47) | 60761 | 57.6%

Value | Count | Frequency (%)
2 | 2337 | 2.2%
4 | 3 | < 0.1%
5 | 1894 | 1.8%
6 | 2725 | 2.6%
8 | 2266 | 2.1%
11 | 3376 | 3.2%
14 | 1270 | 1.2%
15 | 7295 | 6.9%
16 | 1581 | 1.5%
17 | 1 | < 0.1%

Value | Count | Frequency (%)
97 | 559 | 0.5%
82 | 682 | 0.6%
80 | 35 | < 0.1%
79 | 3490 | 3.3%
77 | 3899 | 3.7%
76 | 4469 | 4.2%
72 | 2034 | 1.9%
71 | 26 | < 0.1%
70 | 280 | 0.3%
66 | 3270 | 3.1%

section_name
Text

Distinct56
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size824.7 KiB

Length

Max length30
Median length22
Mean length16.743069
Min length4

Characters and Unicode

Total characters1767097
Distinct characters48
Distinct categories6
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowWomens Everyday Basics
2nd rowWomens Everyday Basics
3rd rowWomens Everyday Basics
4th rowWomens Lingerie
5th rowWomens Lingerie
ValueCountFrequency (%)
womens 33662
 
12.8%
17323
 
6.6%
kids 15153
 
5.8%
collection 14419
 
5.5%
divided 14275
 
5.4%
baby 10551
 
4.0%
girl 10128
 
3.9%
accessories 9735
 
3.7%
everyday 8876
 
3.4%
basics 8828
 
3.4%
Other values (49) 120028
45.6%

Most occurring characters

ValueCountFrequency (%)
e 182527
 
10.3%
157436
 
8.9%
s 142303
 
8.1%
i 130588
 
7.4%
o 123340
 
7.0%
n 99911
 
5.7%
r 93569
 
5.3%
a 92150
 
5.2%
l 72523
 
4.1%
d 67367
 
3.8%
Other values (38) 605383
34.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1336032
75.6%
Uppercase Letter 243540
 
13.8%
Space Separator 157436
 
8.9%
Other Punctuation 27562
 
1.6%
Math Symbol 2337
 
0.1%
Decimal Number 190
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 182527
13.7%
s 142303
10.7%
i 130588
9.8%
o 123340
9.2%
n 99911
 
7.5%
r 93569
 
7.0%
a 92150
 
6.9%
l 72523
 
5.4%
d 67367
 
5.0%
m 63470
 
4.8%
Other values (12) 268284
20.1%
Uppercase Letter
ValueCountFrequency (%)
W 33662
13.8%
B 30475
12.5%
C 29740
12.2%
S 22980
9.4%
D 17628
7.2%
M 15966
 
6.6%
K 15153
 
6.2%
E 14164
 
5.8%
G 13618
 
5.6%
T 8992
 
3.7%
Other values (11) 41162
16.9%
Other Punctuation
ValueCountFrequency (%)
& 22426
81.4%
, 5136
 
18.6%
Space Separator
ValueCountFrequency (%)
157436
100.0%
Math Symbol
ValueCountFrequency (%)
+ 2337
100.0%
Decimal Number
ValueCountFrequency (%)
2 190
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1579572
89.4%
Common 187525
 
10.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 182527
 
11.6%
s 142303
 
9.0%
i 130588
 
8.3%
o 123340
 
7.8%
n 99911
 
6.3%
r 93569
 
5.9%
a 92150
 
5.8%
l 72523
 
4.6%
d 67367
 
4.3%
m 63470
 
4.0%
Other values (33) 511824
32.4%
Common
ValueCountFrequency (%)
157436
84.0%
& 22426
 
12.0%
, 5136
 
2.7%
+ 2337
 
1.2%
2 190
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1767097
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 182527
 
10.3%
157436
 
8.9%
s 142303
 
8.1%
i 130588
 
7.4%
o 123340
 
7.0%
n 99911
 
5.7%
r 93569
 
5.3%
a 92150
 
5.2%
l 72523
 
4.1%
d 67367
 
3.8%
Other values (38) 605383
34.3%

garment_group_no
Real number (ℝ)

Distinct21
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1010.4383
Minimum1001
Maximum1025
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size824.7 KiB

Quantile statistics

Minimum1001
5-th percentile1002
Q11005
median1009
Q31017
95-th percentile1020
Maximum1025
Range24
Interquartile range (IQR)12

Descriptive statistics

Standard deviation6.7310232
Coefficient of variation (CV)0.0066614886
Kurtosis-1.287045
Mean1010.4383
Median Absolute Deviation (MAD)6
Skewness0.31875162
Sum1.0664368 × 108
Variance45.306673
MonotonicityNot monotonic
Histogram with fixed size bins (bins=21)
Value | Count | Frequency (%)
1005 | 21445 | 20.3%
1019 | 11519 | 10.9%
1002 | 8126 | 7.7%
1003 | 7490 | 7.1%
1017 | 7441 | 7.1%
1009 | 6727 | 6.4%
1010 | 5838 | 5.5%
1020 | 5145 | 4.9%
1013 | 4874 | 4.6%
1007 | 4501 | 4.3%
Other values (11) | 22436 | 21.3%

Value | Count | Frequency (%)
1001 | 3873 | 3.7%
1002 | 8126 | 7.7%
1003 | 7490 | 7.1%
1005 | 21445 | 20.3%
1006 | 1965 | 1.9%
1007 | 4501 | 4.3%
1008 | 908 | 0.9%
1009 | 6727 | 6.4%
1010 | 5838 | 5.5%
1011 | 2116 | 2.0%

Value | Count | Frequency (%)
1025 | 1559 | 1.5%
1023 | 1061 | 1.0%
1021 | 2272 | 2.2%
1020 | 5145 | 4.9%
1019 | 11519 | 10.9%
1018 | 2787 | 2.6%
1017 | 7441 | 7.1%
1016 | 3100 | 2.9%
1014 | 1541 | 1.5%
1013 | 4874 | 4.6%

garment_group_name
Text

Distinct21
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size824.7 KiB

Length

Max length29
Median length17
Mean length10.951811
Min length5

Characters and Unicode

Total characters1155876
Distinct characters40
Distinct categories5
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowJersey Basic
2nd rowJersey Basic
3rd rowJersey Basic
4th rowUnder-, Nightwear
5th rowUnder-, Nightwear
Value | Count | Frequency (%)
jersey | 29571 | 18.3%
fancy | 21445 | 13.3%
accessories | 11519 | 7.1%
trousers | 9827 | 6.1%
basic | 8126 | 5.0%
knitwear | 7490 | 4.6%
under | 7441 | 4.6%
nightwear | 7441 | 4.6%
blouses | 5838 | 3.6%
shoes | 5145 | 3.2%
Other values (20) | 47761 | 29.6%

Most occurring characters

ValueCountFrequency (%)
e 160751
13.9%
s 150245
13.0%
r 108764
 
9.4%
i 59052
 
5.1%
a 57461
 
5.0%
n 57297
 
5.0%
56062
 
4.9%
c 55942
 
4.8%
y 54946
 
4.8%
o 51000
 
4.4%
Other values (30) 344356
29.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 918164
79.4%
Uppercase Letter 161297
 
14.0%
Space Separator 56062
 
4.9%
Other Punctuation 12912
 
1.1%
Dash Punctuation 7441
 
0.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 160751
17.5%
s 150245
16.4%
r 108764
11.8%
i 59052
 
6.4%
a 57461
 
6.3%
n 57297
 
6.2%
c 55942
 
6.1%
y 54946
 
6.0%
o 51000
 
5.6%
t 32104
 
3.5%
Other values (13) 130602
14.2%
Uppercase Letter
ValueCountFrequency (%)
J 31536
19.6%
F 21445
13.3%
S 17735
11.0%
B 15929
9.9%
T 12099
 
7.5%
A 11519
 
7.1%
U 11314
 
7.0%
D 10423
 
6.5%
K 9455
 
5.9%
N 7441
 
4.6%
Other values (3) 12401
 
7.7%
Other Punctuation
ValueCountFrequency (%)
, 7441
57.6%
/ 5471
42.4%
Space Separator
ValueCountFrequency (%)
56062
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 7441
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1079461
93.4%
Common 76415
 
6.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 160751
14.9%
s 150245
13.9%
r 108764
 
10.1%
i 59052
 
5.5%
a 57461
 
5.3%
n 57297
 
5.3%
c 55942
 
5.2%
y 54946
 
5.1%
o 51000
 
4.7%
t 32104
 
3.0%
Other values (26) 291899
27.0%
Common
ValueCountFrequency (%)
56062
73.4%
, 7441
 
9.7%
- 7441
 
9.7%
/ 5471
 
7.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1155876
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 160751
13.9%
s 150245
13.0%
r 108764
 
9.4%
i 59052
 
5.1%
a 57461
 
5.0%
n 57297
 
5.0%
56062
 
4.9%
c 55942
 
4.8%
y 54946
 
4.8%
o 51000
 
4.4%
Other values (30) 344356
29.8%

detail_desc
Text

Distinct43404
Distinct (%)41.3%
Missing416
Missing (%)0.4%
Memory size824.7 KiB

Length

Max length764
Median length468
Mean length142.1619
Min length11

Characters and Unicode

Total characters14944912
Distinct characters98
Distinct categories14
Distinct scripts2
Distinct blocks5

Unique

Unique21430
Unique (%)20.4%
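
The Distinct and Unique figures measure different things. On the reading used here (an interpretation, not something the report states), Distinct is the number of different values and Unique is the number of values that occur exactly once. The sketch below shows the distinction on a tiny hypothetical list, then checks that the reported percentages for this variable are reproduced when the denominator is the non-missing count (105542 - 416); that denominator is an inference from the numbers.

```python
from collections import Counter

# Hypothetical mini-column; None marks a missing description.
values = [
    "Jersey top with narrow shoulder straps.",
    "Jersey top with narrow shoulder straps.",
    "Jersey top with narrow shoulder straps.",
    "Large plastic hair claw.",
    None,
]

present = [v for v in values if v is not None]
counts = Counter(present)

distinct = len(counts)                                 # different values
unique = sum(1 for c in counts.values() if c == 1)     # values seen exactly once

# Cross-check of the reported percentages, assuming a non-missing denominator.
non_missing = 105542 - 416
reported_distinct_pct = round(100 * 43404 / non_missing, 1)
reported_unique_pct = round(100 * 21430 / non_missing, 1)
```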

Sample

1st rowJersey top with narrow shoulder straps.
2nd rowJersey top with narrow shoulder straps.
3rd rowJersey top with narrow shoulder straps.
4th rowMicrofibre T-shirt bra with underwired, moulded, lightly padded cups that shape the bust and provide good support. Narrow adjustable shoulder straps and a narrow hook-and-eye fastening at the back. Without visible seams for greater comfort.
5th rowMicrofibre T-shirt bra with underwired, moulded, lightly padded cups that shape the bust and provide good support. Narrow adjustable shoulder straps and a narrow hook-and-eye fastening at the back. Without visible seams for greater comfort.
Value | Count | Frequency (%)
and | 160065 | 6.4%
a | 151693 | 6.0%
with | 150703 | 6.0%
the | 135045 | 5.4%
in | 105374 | 4.2%
at | 80688 | 3.2%
back | 36807 | 1.5%
front | 36244 | 1.4%
soft | 35579 | 1.4%
waist | 34284 | 1.4%
Other values (5000) | 1586260 | 63.1%

Most occurring characters

ValueCountFrequency (%)
2407661
16.1%
e 1318549
 
8.8%
t 1241876
 
8.3%
a 1029247
 
6.9%
n 910904
 
6.1%
i 876105
 
5.9%
s 828095
 
5.5%
o 718446
 
4.8%
r 618196
 
4.1%
d 602822
 
4.0%
Other values (88) 4393011
29.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 11792249
78.9%
Space Separator 2407661
 
16.1%
Other Punctuation 358067
 
2.4%
Uppercase Letter 234150
 
1.6%
Dash Punctuation 111399
 
0.7%
Decimal Number 35387
 
0.2%
Open Punctuation 2083
 
< 0.1%
Close Punctuation 2082
 
< 0.1%
Other Symbol 977
 
< 0.1%
Other Number 444
 
< 0.1%
Other values (4) 413
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 1318549
11.2%
t 1241876
 
10.5%
a 1029247
 
8.7%
n 910904
 
7.7%
i 876105
 
7.4%
s 828095
 
7.0%
o 718446
 
6.1%
r 618196
 
5.2%
d 602822
 
5.1%
h 543005
 
4.6%
Other values (21) 3105004
26.3%
Uppercase Letter
ValueCountFrequency (%)
S 50792
21.7%
L 27632
11.8%
T 24427
 
10.4%
F 12014
 
5.1%
C 11379
 
4.9%
V 11244
 
4.8%
U 8860
 
3.8%
P 8723
 
3.7%
B 8245
 
3.5%
J 8091
 
3.5%
Other values (17) 62743
26.8%
Other Punctuation
ValueCountFrequency (%)
. 207391
57.9%
, 147684
41.2%
/ 2255
 
0.6%
& 412
 
0.1%
% 199
 
0.1%
: 57
 
< 0.1%
' 42
 
< 0.1%
" 12
 
< 0.1%
! 10
 
< 0.1%
? 5
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
5 8514
24.1%
3 4986
14.1%
1 4894
13.8%
4 4657
13.2%
2 3788
10.7%
0 2949
 
8.3%
8 1849
 
5.2%
6 1498
 
4.2%
7 1293
 
3.7%
9 959
 
2.7%
Dash Punctuation
ValueCountFrequency (%)
- 109665
98.4%
1730
 
1.6%
4
 
< 0.1%
Other Symbol
ValueCountFrequency (%)
847
86.7%
® 117
 
12.0%
° 13
 
1.3%
Math Symbol
ValueCountFrequency (%)
+ 10
83.3%
> 1
 
8.3%
< 1
 
8.3%
Open Punctuation
ValueCountFrequency (%)
( 2082
> 99.9%
{ 1
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
) 2081
> 99.9%
} 1
 
< 0.1%
Final Punctuation
ValueCountFrequency (%)
384
97.0%
12
 
3.0%
Initial Punctuation
ValueCountFrequency (%)
3
75.0%
1
 
25.0%
Space Separator
ValueCountFrequency (%)
2407661
100.0%
Other Number
ValueCountFrequency (%)
½ 444
100.0%
Modifier Symbol
ValueCountFrequency (%)
´ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 12026399
80.5%
Common 2918513
 
19.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 1318549
 
11.0%
t 1241876
 
10.3%
a 1029247
 
8.6%
n 910904
 
7.6%
i 876105
 
7.3%
s 828095
 
6.9%
o 718446
 
6.0%
r 618196
 
5.1%
d 602822
 
5.0%
h 543005
 
4.5%
Other values (48) 3339154
27.8%
Common
ValueCountFrequency (%)
2407661
82.5%
. 207391
 
7.1%
, 147684
 
5.1%
- 109665
 
3.8%
5 8514
 
0.3%
3 4986
 
0.2%
1 4894
 
0.2%
4 4657
 
0.2%
2 3788
 
0.1%
0 2949
 
0.1%
Other values (30) 16324
 
0.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 14936558
99.9%
None 5372
 
< 0.1%
Punctuation 2134
 
< 0.1%
Letterlike Symbols 847
 
< 0.1%
Alphabetic PF 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2407661
16.1%
e 1318549
 
8.8%
t 1241876
 
8.3%
a 1029247
 
6.9%
n 910904
 
6.1%
i 876105
 
5.9%
s 828095
 
5.5%
o 718446
 
4.8%
r 618196
 
4.1%
d 602822
 
4.0%
Other values (71) 4384657
29.4%
None
ValueCountFrequency (%)
é 2476
46.1%
ê 2210
41.1%
½ 444
 
8.3%
® 117
 
2.2%
É 102
 
1.9%
° 13
 
0.2%
ñ 6
 
0.1%
à 3
 
0.1%
´ 1
 
< 0.1%
Punctuation
ValueCountFrequency (%)
1730
81.1%
384
 
18.0%
12
 
0.6%
4
 
0.2%
3
 
0.1%
1
 
< 0.1%
Letterlike Symbols
ValueCountFrequency (%)
847
100.0%
Alphabetic PF
ValueCountFrequency (%)
1
100.0%

Interactions

Correlations

variable | article_id | product_code | product_type_no | graphical_appearance_no | colour_group_code | perceived_colour_value_id | perceived_colour_master_id | department_no | index_group_no | section_no | garment_group_no
article_id | 1.000 | 1.000 | -0.040 | -0.000 | -0.042 | -0.054 | 0.029 | -0.073 | -0.068 | -0.042 | 0.010
product_code | 1.000 | 1.000 | -0.040 | -0.000 | -0.042 | -0.054 | 0.029 | -0.073 | -0.068 | -0.042 | 0.010
product_type_no | -0.040 | -0.040 | 1.000 | 0.013 | 0.077 | -0.029 | -0.087 | -0.011 | 0.057 | 0.027 | -0.053
graphical_appearance_no | -0.000 | -0.000 | 0.013 | 1.000 | 0.050 | 0.019 | -0.096 | -0.097 | -0.141 | -0.026 | 0.058
colour_group_code | -0.042 | -0.042 | 0.077 | 0.050 | 1.000 | 0.009 | -0.339 | 0.080 | 0.140 | -0.004 | -0.017
perceived_colour_value_id | -0.054 | -0.054 | -0.029 | 0.019 | 0.009 | 1.000 | -0.038 | 0.007 | -0.022 | -0.005 | 0.027
perceived_colour_master_id | 0.029 | 0.029 | -0.087 | -0.096 | -0.339 | -0.038 | 1.000 | -0.041 | -0.084 | 0.000 | -0.024
department_no | -0.073 | -0.073 | -0.011 | -0.097 | 0.080 | 0.007 | -0.041 | 1.000 | 0.762 | 0.314 | -0.054
index_group_no | -0.068 | -0.068 | 0.057 | -0.141 | 0.140 | -0.022 | -0.084 | 0.762 | 1.000 | 0.249 | -0.124
section_no | -0.042 | -0.042 | 0.027 | -0.026 | -0.004 | -0.005 | 0.000 | 0.314 | 0.249 | 1.000 | 0.182
garment_group_no | 0.010 | 0.010 | -0.053 | 0.058 | -0.017 | 0.027 | -0.024 | -0.054 | -0.124 | 0.182 | 1.000
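
The two HIGH CORRELATION alerts in the Overview can be traced back to this matrix. The sketch below scans the upper triangle for pairs whose absolute coefficient reaches a threshold. The cut-off of 0.5 and the helper name `high_pairs` are assumptions for illustration; the report does not state the rule it applies.

```python
NAMES = [
    "article_id", "product_code", "product_type_no", "graphical_appearance_no",
    "colour_group_code", "perceived_colour_value_id", "perceived_colour_master_id",
    "department_no", "index_group_no", "section_no", "garment_group_no",
]

# Coefficients copied row by row from the correlation matrix above.
MATRIX = [
    [1.000, 1.000, -0.040, -0.000, -0.042, -0.054, 0.029, -0.073, -0.068, -0.042, 0.010],
    [1.000, 1.000, -0.040, -0.000, -0.042, -0.054, 0.029, -0.073, -0.068, -0.042, 0.010],
    [-0.040, -0.040, 1.000, 0.013, 0.077, -0.029, -0.087, -0.011, 0.057, 0.027, -0.053],
    [-0.000, -0.000, 0.013, 1.000, 0.050, 0.019, -0.096, -0.097, -0.141, -0.026, 0.058],
    [-0.042, -0.042, 0.077, 0.050, 1.000, 0.009, -0.339, 0.080, 0.140, -0.004, -0.017],
    [-0.054, -0.054, -0.029, 0.019, 0.009, 1.000, -0.038, 0.007, -0.022, -0.005, 0.027],
    [0.029, 0.029, -0.087, -0.096, -0.339, -0.038, 1.000, -0.041, -0.084, 0.000, -0.024],
    [-0.073, -0.073, -0.011, -0.097, 0.080, 0.007, -0.041, 1.000, 0.762, 0.314, -0.054],
    [-0.068, -0.068, 0.057, -0.141, 0.140, -0.022, -0.084, 0.762, 1.000, 0.249, -0.124],
    [-0.042, -0.042, 0.027, -0.026, -0.004, -0.005, 0.000, 0.314, 0.249, 1.000, 0.182],
    [0.010, 0.010, -0.053, 0.058, -0.017, 0.027, -0.024, -0.054, -0.124, 0.182, 1.000],
]

THRESHOLD = 0.5  # assumed cut-off, not taken from the report


def high_pairs(names, matrix, threshold):
    """Return (name_a, name_b, r) for off-diagonal pairs with |r| >= threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):  # upper triangle only
            if abs(matrix[i][j]) >= threshold:
                pairs.append((names[i], names[j], matrix[i][j]))
    return pairs


pairs = high_pairs(NAMES, MATRIX, THRESHOLD)
```

With this threshold only article_id/product_code and department_no/index_group_no qualify, matching the alerts listed in the Overview.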

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
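
Since the charts themselves are not reproduced here, the following sketch shows the two ideas on a tiny hypothetical table: a per-column count of missing cells, and a text nullity matrix with one line per row. The records, the `#`/`.` rendering and the variable names are all illustrative assumptions. In this report, the 416 missing cells in the Overview equal the Missing count of detail_desc, and every other variable in this section reports 0.

```python
from typing import Optional

# Hypothetical records; None marks a missing cell.
rows: list[dict[str, Optional[str]]] = [
    {"prod_name": "Strap top", "detail_desc": "Jersey top with narrow shoulder straps."},
    {"prod_name": "Example item", "detail_desc": None},
    {"prod_name": "CLAIRE HAIR CLAW", "detail_desc": "Large plastic hair claw."},
]

columns = list(rows[0])

# Nullity by column: how many cells are missing in each column.
missing_per_column = {c: sum(r[c] is None for r in rows) for c in columns}

# Nullity matrix: one string per row, '#' for a present cell, '.' for a missing one.
nullity_matrix = ["".join("." if r[c] is None else "#" for c in columns) for r in rows]

for line in nullity_matrix:
    print(line)
```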

Sample

row | article_id | product_code | prod_name | product_type_no | product_type_name | product_group_name | graphical_appearance_no | graphical_appearance_name | colour_group_code | colour_group_name | perceived_colour_value_id | perceived_colour_value_name | perceived_colour_master_id | perceived_colour_master_name | department_no | department_name | index_code | index_name | index_group_no | index_group_name | section_no | section_name | garment_group_no | garment_group_name | detail_desc
0 | 108775015 | 108775 | Strap top | 253 | Vest top | Garment Upper body | 1010016 | Solid | 9 | Black | 4 | Dark | 5 | Black | 1676 | Jersey Basic | A | Ladieswear | 1 | Ladieswear | 16 | Womens Everyday Basics | 1002 | Jersey Basic | Jersey top with narrow shoulder straps.
1 | 108775044 | 108775 | Strap top | 253 | Vest top | Garment Upper body | 1010016 | Solid | 10 | White | 3 | Light | 9 | White | 1676 | Jersey Basic | A | Ladieswear | 1 | Ladieswear | 16 | Womens Everyday Basics | 1002 | Jersey Basic | Jersey top with narrow shoulder straps.
2 | 108775051 | 108775 | Strap top (1) | 253 | Vest top | Garment Upper body | 1010017 | Stripe | 11 | Off White | 1 | Dusty Light | 9 | White | 1676 | Jersey Basic | A | Ladieswear | 1 | Ladieswear | 16 | Womens Everyday Basics | 1002 | Jersey Basic | Jersey top with narrow shoulder straps.
3 | 110065001 | 110065 | OP T-shirt (Idro) | 306 | Bra | Underwear | 1010016 | Solid | 9 | Black | 4 | Dark | 5 | Black | 1339 | Clean Lingerie | B | Lingeries/Tights | 1 | Ladieswear | 61 | Womens Lingerie | 1017 | Under-, Nightwear | Microfibre T-shirt bra with underwired, moulded, lightly padded cups that shape the bust and provide good support. Narrow adjustable shoulder straps and a narrow hook-and-eye fastening at the back. Without visible seams for greater comfort.
4 | 110065002 | 110065 | OP T-shirt (Idro) | 306 | Bra | Underwear | 1010016 | Solid | 10 | White | 3 | Light | 9 | White | 1339 | Clean Lingerie | B | Lingeries/Tights | 1 | Ladieswear | 61 | Womens Lingerie | 1017 | Under-, Nightwear | Microfibre T-shirt bra with underwired, moulded, lightly padded cups that shape the bust and provide good support. Narrow adjustable shoulder straps and a narrow hook-and-eye fastening at the back. Without visible seams for greater comfort.
5 | 110065011 | 110065 | OP T-shirt (Idro) | 306 | Bra | Underwear | 1010016 | Solid | 12 | Light Beige | 1 | Dusty Light | 11 | Beige | 1339 | Clean Lingerie | B | Lingeries/Tights | 1 | Ladieswear | 61 | Womens Lingerie | 1017 | Under-, Nightwear | Microfibre T-shirt bra with underwired, moulded, lightly padded cups that shape the bust and provide good support. Narrow adjustable shoulder straps and a narrow hook-and-eye fastening at the back. Without visible seams for greater comfort.
6 | 111565001 | 111565 | 20 den 1p Stockings | 304 | Underwear Tights | Socks & Tights | 1010016 | Solid | 9 | Black | 4 | Dark | 5 | Black | 3608 | Tights basic | B | Lingeries/Tights | 1 | Ladieswear | 62 | Womens Nightwear, Socks & Tigh | 1021 | Socks and Tights | Semi shiny nylon stockings with a wide, reinforced trim at the top. Use with a suspender belt. 20 denier.
7 | 111565003 | 111565 | 20 den 1p Stockings | 302 | Socks | Socks & Tights | 1010016 | Solid | 13 | Beige | 2 | Medium Dusty | 11 | Beige | 3608 | Tights basic | B | Lingeries/Tights | 1 | Ladieswear | 62 | Womens Nightwear, Socks & Tigh | 1021 | Socks and Tights | Semi shiny nylon stockings with a wide, reinforced trim at the top. Use with a suspender belt. 20 denier.
8 | 111586001 | 111586 | Shape Up 30 den 1p Tights | 273 | Leggings/Tights | Garment Lower body | 1010016 | Solid | 9 | Black | 4 | Dark | 5 | Black | 3608 | Tights basic | B | Lingeries/Tights | 1 | Ladieswear | 62 | Womens Nightwear, Socks & Tigh | 1021 | Socks and Tights | Tights with built-in support to lift the bottom. Black in 30 denier and light amber in 15 denier.
9 | 111593001 | 111593 | Support 40 den 1p Tights | 304 | Underwear Tights | Socks & Tights | 1010016 | Solid | 9 | Black | 4 | Dark | 5 | Black | 3608 | Tights basic | B | Lingeries/Tights | 1 | Ladieswear | 62 | Womens Nightwear, Socks & Tigh | 1021 | Socks and Tights | Semi shiny tights that shape the tummy, thighs and calves while also encouraging blood circulation in the legs. Elasticated waist.
row | article_id | product_code | prod_name | product_type_no | product_type_name | product_group_name | graphical_appearance_no | graphical_appearance_name | colour_group_code | colour_group_name | perceived_colour_value_id | perceived_colour_value_name | perceived_colour_master_id | perceived_colour_master_name | department_no | department_name | index_code | index_name | index_group_no | index_group_name | section_no | section_name | garment_group_no | garment_group_name | detail_desc
105532 | 949594001 | 949594 | LOGG Elvis jogger. | 272 | Trousers | Garment Lower body | 1010016 | Solid | 8 | Dark Grey | 4 | Dark | -1 | Unknown | 1919 | Jersey | A | Ladieswear | 1 | Ladieswear | 2 | H&M+ | 1005 | Jersey Fancy | Joggers in soft sweatshirt fabric with an elasticated, drawstring waist, diagonal side pockets and slim legs with ribbed hems.
105533 | 950449002 | 950449 | Compact brush Fancy | 78 | Other accessories | Accessories | 1010016 | Solid | 50 | Other Pink | 5 | Bright | 4 | Pink | 4313 | Girls Small Acc/Bags | J | Children Accessories, Swimwear | 4 | Baby/Children | 43 | Kids Accessories, Swimwear & D | 1019 | Accessories | Small, folding hair brush with a rhinestone-decorated lid that has a mirror inside. Diameter 6.5 cm.
105534 | 952267001 | 952267 | Heavy plain overknee tights 1p | 304 | Underwear Tights | Socks & Tights | 1010013 | Other pattern | 9 | Black | 4 | Dark | 5 | Black | 3608 | Tights basic | B | Lingeries/Tights | 1 | Ladieswear | 62 | Womens Nightwear, Socks & Tigh | 1021 | Socks and Tights | Fine-knit tights with an elasticated waist that are thinner at the top and more opaque at the bottom giving them the appearance of over-the-knee socks.
105535 | 952937003 | 952937 | Jets dress | 265 | Dress | Garment Full body | 1010001 | All over pattern | 13 | Beige | 2 | Medium Dusty | 1 | Mole | 1641 | Jersey | A | Ladieswear | 1 | Ladieswear | 18 | Womens Trend | 1005 | Jersey Fancy | Fitted, calf-length dress in viscose jersey with a stand-up collar and concealed zip at the back. Double layer at the top with wrapover, draped sections, close-fitting, extra-long sleeves and an asymmetric skirt with a high slit in one side. Lined.
105536 | 952938001 | 952938 | Elton top | 254 | Top | Garment Upper body | 1010001 | All over pattern | 13 | Beige | 2 | Medium Dusty | 1 | Mole | 1641 | Jersey | A | Ladieswear | 1 | Ladieswear | 18 | Womens Trend | 1005 | Jersey Fancy | Fitted top in jersey with a round neckline and extra-long sleeves. Additional draped layer at the front.
105537 | 953450001 | 953450 | 5pk regular Placement1 | 302 | Socks | Socks & Tights | 1010014 | Placement print | 9 | Black | 4 | Dark | 5 | Black | 7188 | Socks Bin | F | Menswear | 3 | Menswear | 26 | Men Underwear | 1021 | Socks and Tights | Socks in a fine-knit cotton blend with a small motif at the top and elasticated tops.
105538 | 953763001 | 953763 | SPORT Malaga tank | 253 | Vest top | Garment Upper body | 1010016 | Solid | 9 | Black | 4 | Dark | 5 | Black | 1919 | Jersey | A | Ladieswear | 1 | Ladieswear | 2 | H&M+ | 1005 | Jersey Fancy | Loose-fitting sports vest top in ribbed fast-drying functional fabric made from recycled polyester with a racer back and rounded hem.
105539 | 956217002 | 956217 | Cartwheel dress | 265 | Dress | Garment Full body | 1010016 | Solid | 9 | Black | 4 | Dark | 5 | Black | 1641 | Jersey | A | Ladieswear | 1 | Ladieswear | 18 | Womens Trend | 1005 | Jersey Fancy | Short, A-line dress in jersey with a round neckline and V-shaped opening at the front with narrow ties. Long, voluminous raglan sleeves and wide cuffs with covered buttons.
105540 | 957375001 | 957375 | CLAIRE HAIR CLAW | 72 | Hair clip | Accessories | 1010016 | Solid | 9 | Black | 4 | Dark | 5 | Black | 3946 | Small Accessories | D | Divided | 2 | Divided | 52 | Divided Accessories | 1019 | Accessories | Large plastic hair claw.
105541 | 959461001 | 959461 | Lounge dress | 265 | Dress | Garment Full body | 1010016 | Solid | 11 | Off White | 1 | Dusty Light | 9 | White | 1641 | Jersey | A | Ladieswear | 1 | Ladieswear | 18 | Womens Trend | 1005 | Jersey Fancy | Calf-length dress in ribbed jersey made from a cotton blend. Low-cut V-neck at the back, dropped shoulders and long, wide sleeves that taper to the cuffs. Unlined.